Abstract. Let T be a self-adjoint operator on a finite dimensional Hilbert space. It is shown 
that the distribution of the eigenvalues of a compression of T to a subspace of a given dimension 
is almost the same for almost all subspaces. This is a coordinate-free analogue of a recent result 
of Chatterjee and Ledoux on principal submatrices. The proof is based on measure concentration 
and entropy techniques, and the result improves on some aspects of the result of Chatterjee and 
Ledoux. 



1. Introduction 

Let $T$ be an operator on a (real or complex) $n$-dimensional Hilbert space $\mathcal{H}$, and let
$E \subseteq \mathcal{H}$ be a subspace. The compression of $T$ to $E$ is the operator
$T_E = \pi_E T|_E = \pi_E T \pi_E^*$ on $E$, where $\pi_E : \mathcal{H} \to E$ is the orthogonal
projection. The spectral distribution of a self-adjoint operator $T$ is the probability measure on
$\mathbb{R}$

\[ \mu_T = \frac{1}{n} \sum_{i=1}^n \delta_{\lambda_i(T)}, \]

where $\lambda_1(T) \ge \cdots \ge \lambda_n(T)$ are the eigenvalues of $T$, counted with multiplicity.

The following result shows that for $1 \le k \le n$ and a self-adjoint operator $T$ on an
$n$-dimensional Hilbert space $\mathcal{H}$, the empirical spectral distribution of the compression
$T_E$ is almost the same for almost every $k$-dimensional subspace $E \subseteq \mathcal{H}$. The
notations $\sigma_k$ and $\rho$ are explained after the statement of the theorem; $d_1$ denotes the
Kantorovich-Rubinstein metric on probability measures, also defined below.

Theorem 1. Let $\mathcal{H}$ be an $n$-dimensional Hilbert space, $T$ a self-adjoint operator on
$\mathcal{H}$, and $1 \le k \le n$. Let $E$ be a $k$-dimensional subspace of $\mathcal{H}$ chosen at
random with respect to the rotationally invariant probability measure on the Grassmann manifold. Let
$\mu_E$ be the empirical spectral distribution of the compression of $T$ to $E$, and let
$\mu = \mathbb{E}\mu_E$. Then

\[ \mathbb{E} d_1(\mu_E, \mu) \le c_1 \frac{\sigma_k(T)^{4/7} \rho(T)^{3/7}}{(kn)^{2/7}} \tag{1} \]

and

\[ \mathbb{P}\left[ d_1(\mu_E, \mu) \ge c_1 \frac{\sigma_k(T)^{4/7} \rho(T)^{3/7}}{(kn)^{2/7}} + t \right] \le c_2 \exp\left[ -c_3 \frac{kn}{\sigma_k(T)^2} t^2 \right] \tag{2} \]

for every $t > 0$, where $c_1, c_2, c_3 > 0$ are absolute (computable) constants.

Here $\rho(T) = \frac{1}{2}(\lambda_1 - \lambda_n)$ denotes one half the spectral diameter of $T$ (which
is different in general from the classical spectral radius); it is easy to check that $\rho(T)$ is the
distance of $T$ from the space of real scalar operators with respect to the operator norm. For
$1 \le k \le n$,

\[ \sigma_k(T) = \inf_{\lambda \in \mathbb{R}} \left( \sum_{i=1}^k s_i(T - \lambda I)^2 \right)^{1/2}, \]

where $s_1 \ge \cdots \ge s_n \ge 0$ denote singular values. That is, $\sigma_k(T)$ is the distance of $T$
from the space of real scalar operators with respect to the norm
$T \mapsto \sqrt{\sum_{i=1}^k s_i(T)^2}$.
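As an illustrative aside that is not part of the original argument, the two quantities $\rho(T)$ and $\sigma_k(T)$ can be computed directly from a list of eigenvalues. The sketch below assumes $T$ is self-adjoint, so that the singular values of $T - \lambda I$ are $|\lambda_i - \lambda|$; the function names and the ternary search over $\lambda$ are choices made here for illustration (the search relies on the convexity of $\lambda \mapsto (\sum_{i \le k} s_i(T - \lambda I)^2)^{1/2}$).

```python
import math

def rho(eigs):
    """Half the spectral diameter: (lambda_max - lambda_min) / 2."""
    return (max(eigs) - min(eigs)) / 2.0

def k_norm_shifted(eigs, k, lam):
    """sqrt of the sum of the k largest s_i(T - lam*I)^2.

    For self-adjoint T the singular values of T - lam*I are |lambda_i - lam|.
    """
    sing = sorted((abs(x - lam) for x in eigs), reverse=True)
    return math.sqrt(sum(s * s for s in sing[:k]))

def sigma_k(eigs, k, iters=200):
    """Minimize the convex map lam -> k_norm_shifted(eigs, k, lam).

    The minimizer lies in [lambda_min, lambda_max], so a ternary search
    on that interval suffices.
    """
    lo, hi = min(eigs), max(eigs)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if k_norm_shifted(eigs, k, m1) <= k_norm_shifted(eigs, k, m2):
            hi = m2
        else:
            lo = m1
    return k_norm_shifted(eigs, k, (lo + hi) / 2.0)

# Hypothetical spectrum: a cluster at 1 with two outliers.
example_eigs = [3.0, 1.0, 1.0, 1.0, -1.0]
example_rho = rho(example_eigs)
example_sigmas = [sigma_k(example_eigs, k) for k in range(1, 6)]
```

For this spectrum $\rho = 2$, $\sigma_1 = 2$ and $\sigma_2 = 2\sqrt{2}$, consistent with the general estimates $\rho(T) \le \sigma_k(T) \le \sqrt{k}\,\rho(T)$ used in the discussion section.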

The space of probability measures (with finite first moment) on $\mathbb{R}$ is equipped with the
Kantorovich-Rubinstein or $L^1$-Wasserstein distance $d_1$, which may be equivalently defined in the
following three ways:

\[ d_1(\mu, \nu) = \inf_\pi \int |x - y| \, d\pi(x, y) = \sup_f \left( \int f \, d\mu - \int f \, d\nu \right) = \int_{-\infty}^{\infty} |F_\mu(x) - F_\nu(x)| \, dx. \tag{3} \]

Here $\pi$ varies over all probability measures on $\mathbb{R} \times \mathbb{R}$ with marginals $\mu$
and $\nu$; $f$ varies over all Lipschitz continuous functions $\mathbb{R} \to \mathbb{R}$ with Lipschitz
constant at most 1; and $F_\mu$, $F_\nu$ are the cumulative distribution functions of $\mu$, $\nu$. All
three characterizations will be used in this note; for the equalities see [10, Chapter 1].
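The agreement of the three characterizations in (3) can be seen concretely for two empirical measures with the same number of atoms. The following is a minimal sketch, not taken from the original paper: the function names are hypothetical, the coupling used is the one that pairs sorted atoms (known to be optimal on the real line), and the test function $f(x) = |x - 2|$ was chosen by hand for this particular pair of measures.

```python
def w1_sorted_coupling(xs, ys):
    """Cost of the coupling pairing the i-th smallest atom of mu with the
    i-th smallest atom of nu (equal numbers of atoms assumed)."""
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

def w1_cdf(xs, ys):
    """Integral over R of |F_mu - F_nu| for two empirical measures.

    Both CDFs are constant on [a, b) for consecutive atoms a < b.
    """
    n, m = len(xs), len(ys)
    pts = sorted(set(xs) | set(ys))
    total = 0.0
    for a, b in zip(pts, pts[1:]):
        f_mu = sum(1 for x in xs if x <= a) / n
        f_nu = sum(1 for y in ys if y <= a) / m
        total += abs(f_mu - f_nu) * (b - a)
    return total

def dual_value(xs, ys, f):
    """Integral of f against mu minus integral of f against nu."""
    return sum(f(x) for x in xs) / len(xs) - sum(f(y) for y in ys) / len(ys)

# Hypothetical atoms of mu and nu.
mu_atoms = [0.0, 1.0, 3.0]
nu_atoms = [0.5, 2.0, 2.0]

primal = w1_sorted_coupling(mu_atoms, nu_atoms)
cdf_form = w1_cdf(mu_atoms, nu_atoms)
dual = dual_value(mu_atoms, nu_atoms, lambda x: abs(x - 2.0))  # f is 1-Lipschitz
```

All three values equal $5/6$ for these atoms, so the coupling and the test function are both optimal here.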

Theorem 1 is a coordinate-free analogue of a recent result of Chatterjee and Ledoux [3], which
considered the empirical spectral measure of a random $k \times k$ principal submatrix of a fixed
$n \times n$ Hermitian matrix. The approach taken in [3] is rather different from the one taken here;
the result of [3] is also given in terms of the Kolmogorov distance between measures, rather than
Wasserstein distance. See Section 3 below for a more detailed comparison of the results.
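To make the statement of Theorem 1 concrete, consider the simplest case $k = 1$, where $E$ is spanned by a random unit vector $u$, $\mu_E$ is the point mass at $\langle Tu, u \rangle$, and $\mu$ is the law of $\langle Tu, u \rangle$. Then $\mathbb{E} d_1(\mu_E, \mu) = \mathbb{E}|X - X'|$ for independent copies $X, X'$ of $\langle Tu, u \rangle$. The Monte Carlo sketch below is an illustration only: it assumes a real Hilbert space, a diagonal $T$ with a hypothetical two-point spectrum $\pm 1$ (so $\rho(T) = 1$), and sample sizes and function names chosen here for convenience.

```python
import random

def compress_to_line(eigs, rng):
    """<Tu, u> for T = diag(eigs) and u uniform on the real unit sphere,
    sampled by normalizing a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in eigs]
    norm_sq = sum(x * x for x in g)
    return sum(lam * x * x for lam, x in zip(eigs, g)) / norm_sq

def mean_d1_k1(eigs, pairs, seed=0):
    """Monte Carlo estimate of E d_1(mu_E, mu) for k = 1.

    Uses d_1(delta_x, mu) = E|x - Y| with Y ~ mu, so that
    E d_1(mu_E, mu) = E|X - X'| for independent copies X, X'.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        total += abs(compress_to_line(eigs, rng) - compress_to_line(eigs, rng))
    return total / pairs

def two_point_spectrum(n):
    """n/2 eigenvalues equal to +1 and n/2 equal to -1 (n even)."""
    return [1.0] * (n // 2) + [-1.0] * (n // 2)

small_n = mean_d1_k1(two_point_spectrum(4), pairs=1000)
large_n = mean_d1_k1(two_point_spectrum(400), pairs=1000)
```

The estimate for $n = 400$ comes out several times smaller than the one for $n = 4$, in line with the qualitative content of the theorem: with $\rho(T)$ fixed, the compression to a random line concentrates as the ambient dimension grows.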

2. Proof of Theorem 1 

Throughout this section let $\mathcal{H}$ and $T$ be fixed, and let $\mu_E$ and $\mu$ be as defined in
the statement of the theorem. For brevity we write $\sigma_k = \sigma_k(T)$ and $\rho = \rho(T)$. The
notation $A \lesssim B$ means $A \le cB$, where $c > 0$ is some absolute constant.

Recall that the Grassmann manifold $G_k(\mathcal{H})$ of $k$-dimensional subspaces of $\mathcal{H}$ is
equipped with the metric

\[ d(E, F) = \inf \left( \sum_{i=1}^k \|e_i - f_i\|^2 \right)^{1/2}, \]

where the infimum is over all orthonormal bases $\{e_1, \dots, e_k\}$ and $\{f_1, \dots, f_k\}$ of $E$
and $F$ respectively.

Lemma 2. For any $E, F \in G_k(\mathcal{H})$, $d_1(\mu_E, \mu_F) \le \frac{2\sigma_k}{\sqrt{k}} \, d(E, F)$.

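Proof. Define a coupling $\pi$ of $\mu_E$ and $\mu_F$ by
$\pi = \frac{1}{k} \sum_{i=1}^k \delta_{(\lambda_i(T_E), \lambda_i(T_F))}$. Then

\[ d_1(\mu_E, \mu_F) \le \int |x - y| \, d\pi(x, y) = \frac{1}{k} \sum_{i=1}^k |\lambda_i(T_E) - \lambda_i(T_F)| \le \frac{1}{\sqrt{k}} \left( \sum_{i=1}^k |\lambda_i(T_E) - \lambda_i(T_F)|^2 \right)^{1/2}. \]

Now if $\{e_1, \dots, e_k\}$ and $\{f_1, \dots, f_k\}$ are orthonormal bases of $E$ and $F$, then the
matrices of $T_E$ and $T_F$ with respect to these bases are
$[\langle T(e_j), e_i \rangle]_{i,j=1}^k$ and $[\langle T(f_j), f_i \rangle]_{i,j=1}^k$ respectively. As a
consequence of Lidskii's theorem (see [1, § III.4]), for any $k \times k$ Hermitian matrices $A$ and $B$,

\[ \left( \sum_{i=1}^k |\lambda_i(A) - \lambda_i(B)|^2 \right)^{1/2} \le \|A - B\|_{HS} = \left( \sum_{i,j=1}^k |a_{ij} - b_{ij}|^2 \right)^{1/2}. \]

Thus by the self-adjointness of $T$ and the Cauchy-Schwarz inequality,

\[ \left( \sum_{i=1}^k |\lambda_i(T_E) - \lambda_i(T_F)|^2 \right)^{1/2} \le \left( \sum_{i,j=1}^k |\langle T(e_j), e_i \rangle - \langle T(f_j), f_i \rangle|^2 \right)^{1/2} \le 2 \left( \sum_{i=1}^k s_i(T)^2 \right)^{1/2} \left( \sum_{i=1}^k \|e_i - f_i\|^2 \right)^{1/2}, \]

and taking the infimum over orthonormal bases bounds the left-hand side by
$2 \left( \sum_{i=1}^k s_i(T)^2 \right)^{1/2} d(E, F)$.
Observing that $d_1(\mu_E, \mu_F)$ is invariant under addition of a real scalar matrix to $T$, the lemma
is proved. □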

The same proof as above can be carried out (and is slightly simpler) with the Kantorovich-Rubinstein
distance replaced by the $L^2$-Wasserstein distance, although this observation will not be used here.

The following concentration inequality goes back to Gromov and Milman [5]; see also section 2.1 
of [6] where it is pointed out explicitly that the same result applies in the complex case. 

Theorem 3. Let $f : G_k(\mathcal{H}) \to \mathbb{R}$ be 1-Lipschitz with respect to the metric $d$ on
$G_k(\mathcal{H})$, and let $E \in G_k(\mathcal{H})$ be distributed according to the rotation-invariant
probability measure on $G_k(\mathcal{H})$. Then

\[ \mathbb{P}\left[ |f(E) - \mathbb{E} f(E)| \ge t \right] \le \exp\left[ -c n t^2 \right] \]

for $t > 0$, where $c > 0$ is an absolute constant.

Observe that (1), Lemma 2 and Theorem 3 together imply (2), so it suffices now to prove (1).

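Let $E \in G_k(\mathcal{H})$ be distributed according to the rotation-invariant probability measure on
$G_k(\mathcal{H})$. For a given function $f : \mathbb{R} \to \mathbb{R}$, define the random variable
$X_f = \int f \, d\mu_E - \int f \, d\mu$. By Lemma 2 and Theorem 3, for functions $f, g$,

\[ \mathbb{P}\left[ |X_f - X_g| \ge t \right] \le \exp\left[ -c \frac{kn}{\sigma_k^2 \, |f - g|_L^2} t^2 \right] \tag{4} \]

for $t > 0$, where $|f|_L$ denotes the Lipschitz constant of $f$.

The inequality (4) shows that the random process $X_f$, indexed by some family $\mathcal{F}$ of
Lipschitz continuous test functions (to be determined), satisfies a subgaussian increment condition
with respect to the norm $\|\cdot\|' = \|\cdot\|_{C^1}$ on $\mathcal{F}$ (here,
$\|f\|_{C^1} := \max\{\|f\|_\infty, \|f'\|_\infty\}$, so that for $f \in C^1$,
$|f|_L \le \|f\|_{C^1}$). This raises the possibility of estimating its expected supremum by Dudley's
entropy bound [4] (see also [5]):

\[ \mathbb{E} \sup_{f \in \mathcal{F}} X_f \lesssim \frac{\sigma_k}{\sqrt{kn}} \int_0^\infty \sqrt{\log N(\mathcal{F}, \|\cdot\|', \varepsilon)} \, d\varepsilon, \tag{5} \]

where $N(\mathcal{F}, \|\cdot\|', \varepsilon)$ is the minimum number of sets of diameter $\varepsilon$
with respect to $\|\cdot\|'$ needed to cover $\mathcal{F}$. Since $\mu_E$ and $\mu$ are supported on
$[\lambda_n, \lambda_1]$,

\[ d_1(\mu_E, \mu) = \sup\{X_f : |f|_L \le 1\} = \sup\{X_f : |f|_L \le 1, \ \|f\|_\infty \le 2\rho\}. \]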

Thus to prove (1), it suffices to estimate $\mathbb{E} \sup_{f \in \mathcal{F}} X_f$ for
$\mathcal{F} = \{f : \|f\|_{C^1} \le 1 + 2\rho\}$. However, since $C^1$ is an infinite-dimensional
function space, for this choice of $\mathcal{F}$ the covering numbers
$N(\mathcal{F}, \|\cdot\|', \varepsilon)$ in (5) will always be infinite for small $\varepsilon$.

Instead, define $\mathcal{F} = \{f : \|f\|_{C^2} \le 1\}$, where
$\|f\|_{C^2} := \max\{\|f\|_\infty, \|f'\|_\infty, \|f''\|_\infty\}$. The covering numbers
$N(\mathcal{F}, \|\cdot\|_{C^1}, \varepsilon)$ can be estimated using the methods of [9, § 2.7]; see
[Tj for explicit estimates which, combined with (5) and a linear change of variables, yield

\[ \mathbb{E} \sup\{X_f : \|f\|_{C^2} \le 1\} \lesssim \frac{\sigma_k}{\sqrt{kn}} \int_0^2 \sqrt{1 + \log\frac{1}{\varepsilon} + \frac{1}{\varepsilon}(\rho + 1)} \, d\varepsilon \lesssim \frac{\sigma_k \sqrt{\rho + 1}}{\sqrt{kn}}. \tag{6} \]



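The bound (1) is now derived from (6) via a smoothing and scaling argument. Fix
$f : \mathbb{R} \to \mathbb{R}$ with $|f|_L \le 1$ and $\|f\|_\infty \le 2\rho$. Let
$\varphi : \mathbb{R} \to \mathbb{R}$ be a smooth probability density with finite first absolute moment
and $\varphi' \in L^1(\mathbb{R})$. For $t > 0$ define
$\varphi_t(x) = \frac{1}{t} \varphi\left(\frac{x}{t}\right)$, and let $g_t = f * \varphi_t$. Then

\[ \|g_t\|_\infty \le \|f\|_\infty \|\varphi_t\|_1 \le 2\rho, \qquad \|g_t'\|_\infty \le |f|_L \|\varphi_t\|_1 \le 1, \qquad \|g_t''\|_\infty \le |f|_L \|\varphi_t'\|_1 \lesssim \frac{1}{t}. \tag{7} \]

Now for any probability measure $\nu$ on $\mathbb{R}$,

\[ \left| \int f \, d\nu - \int g_t \, d\nu \right| = \left| \int \int [f(x) - f(x - y)] \varphi_t(y) \, dy \, d\nu(x) \right| \le |f|_L \int |y| \varphi_t(y) \, dy \lesssim t. \]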



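Thus

\[ |X_f| \le \left| \int f \, d\mu_E - \int g_t \, d\mu_E \right| + \left| \int g_t \, d\mu_E - \int g_t \, d\mu \right| + \left| \int g_t \, d\mu - \int f \, d\mu \right|, \]

and so by (6) and (7),

\[ \mathbb{E} d_1(\mu_E, \mu) \lesssim t + \left( 1 + 2\rho + \frac{1}{t} \right) \frac{\sigma_k \sqrt{\rho + 1}}{\sqrt{kn}}. \]

Picking $t$ of the order $\frac{\sigma_k^{1/2} (\rho + 1)^{1/4}}{(kn)^{1/4}}$ yields

\[ \mathbb{E} d_1(\mu_E, \mu) \lesssim \frac{\sigma_k^{1/2} (\rho + 1)^{1/4}}{(kn)^{1/4}} + \frac{\sigma_k (\rho + 1)^{3/2}}{\sqrt{kn}}. \tag{8} \]

Now apply (8) with the operator $T$ replaced by $sT$ for $s > 0$. It is easy to check that the
Kantorovich-Rubinstein distance $d_1(\mu_E, \mu)$ is homogeneous with respect to this rescaling, as are
$\sigma_k$ and $\rho$. Thus one obtains

\[ \mathbb{E} d_1(\mu_E, \mu) \lesssim \frac{1}{s} \left[ \frac{\sqrt{s \sigma_k} \, (s\rho + 1)^{1/4}}{(kn)^{1/4}} + \frac{s \sigma_k (s\rho + 1)^{3/2}}{\sqrt{kn}} \right]. \]

Picking $s$ of the order $\frac{(kn)^{1/7}}{\sigma_k^{2/7} \rho^{5/7}}$ yields (1).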



□ 
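The exponent bookkeeping in the final scaling step can be checked numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: it takes the rescaled right-hand side to be $\frac{1}{s}\left[\sqrt{s\sigma_k}(s\rho+1)^{1/4}(kn)^{-1/4} + s\sigma_k(s\rho+1)^{3/2}(kn)^{-1/2}\right]$, chooses $s = (kn)^{1/7}\sigma_k^{-2/7}\rho^{-5/7}$, and compares the result with $\sigma_k^{4/7}\rho^{3/7}(kn)^{-2/7}$. The parameter triples are hypothetical and satisfy $kn\rho^2 \ge \sigma_k^2$, so that $s\rho \ge 1$.

```python
def rescaled_bound(s, sigma_k, rho, kn):
    """Right-hand side obtained by applying the bound to sT and dividing by s."""
    first = (s * sigma_k) ** 0.5 * (s * rho + 1.0) ** 0.25 / kn ** 0.25
    second = s * sigma_k * (s * rho + 1.0) ** 1.5 / kn ** 0.5
    return (first + second) / s

def balancing_s(sigma_k, rho, kn):
    """Scale s at which the two terms of rescaled_bound have the same order."""
    return kn ** (1.0 / 7.0) / (sigma_k ** (2.0 / 7.0) * rho ** (5.0 / 7.0))

def target(sigma_k, rho, kn):
    """The quantity sigma_k^(4/7) rho^(3/7) / (kn)^(2/7) appearing in (1)."""
    return sigma_k ** (4.0 / 7.0) * rho ** (3.0 / 7.0) / kn ** (2.0 / 7.0)

# Hypothetical (sigma_k, rho, kn) triples with kn * rho^2 >= sigma_k^2.
params = [(1.0, 1.0, 1.0), (2.0, 1.5, 50.0), (0.3, 0.2, 1000.0)]
ratios = [
    rescaled_bound(balancing_s(sg, r, kn), sg, r, kn) / target(sg, r, kn)
    for sg, r, kn in params
]
```

Since $s\rho \le s\rho + 1 \le 2s\rho$ when $s\rho \ge 1$, each ratio lies between $2$ and $2^{1/4} + 2^{3/2}$, which confirms that this choice of $s$ reproduces the right-hand side of (1) up to an absolute constant.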



3. Discussion 



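In [3], Chatterjee and Ledoux proved a version of Theorem 1 for principal submatrices. Namely, let
$\mathcal{H} = \mathbb{C}^n$, and suppose $E$ is now uniformly distributed among $k$-dimensional
coordinate subspaces of $\mathbb{C}^n$. Then [3] shows that

\[ \mathbb{P}\left[ d_\infty(\mu_E, \mu) \ge k^{-1/2} + t \right] \le 12 \sqrt{k} \, e^{-t\sqrt{k/8}} \tag{9} \]

for $t > 0$, and consequently

\[ \mathbb{E} d_\infty(\mu_E, \mu) \le \frac{13 + \sqrt{8} \log k}{\sqrt{k}}. \tag{10} \]

Here $d_\infty(\mu, \nu) = \|F_\mu - F_\nu\|_\infty$ is the Kolmogorov distance between probability
measures $\mu$ and $\nu$ on $\mathbb{R}$.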
It is likely that the methods of this paper could be used to prove a result in the setting of [3], by
replacing Theorem 3, which follows from concentration inequalities on the unitary or special
orthogonal group, with an appropriate concentration inequality on the symmetric group $S_n$.
Furthermore, it may be possible to prove a result in the setting of this paper using methods related
to those of [3], such as adapting the approach of Chatterjee in [2]. Below some quantitative
comparison will be offered between Theorem 1 and the result of [3], ignoring the fact that the random
subspace $E$ has a different distribution in each setting. In particular, the distribution of $E$ is
probably responsible for the difference between the subexponential tail decay in (9) and the
subgaussian tail decay in (2).
Before discussing more specific quantitative comparisons, we note that the clearest difference
between the two results is that ours is coordinate-free. While there are settings in which coordinates
have meaning and thus coordinate-oriented results are natural, there are many settings in which
there is no clearly preferred basis in which to view an operator. Take, for example, the Laplacian
$\Delta$ on the sphere. It has eigenvalues (up to sign convention)
$0 > -\lambda_1 > -\lambda_2 > \cdots \to -\infty$, and the corresponding eigenspaces are
multidimensional. If one took $\mathcal{H}$ to be the span of the first $m$ eigenspaces, with
$T = \Delta|_{\mathcal{H}}$, there is no canonical choice of basis within each eigenspace, and so
it would seem more natural to consider compressions of $T$ to all subspaces of a given dimension,
rather than only to the coordinate subspaces for some choice of basis.

Comparisons of the results are made somewhat difficult as the Kantorovich-Rubinstein distance
$d_1$ and the Kolmogorov distance $d_\infty$ are not comparable in general. However, since the measures
here are all supported in the interval $[\lambda_n(T), \lambda_1(T)]$, from the third representation of
$d_1$ in (3) one obtains the estimate

\[ d_1(\mu_E, \mu) \le 2\rho(T) \, d_\infty(\mu_E, \mu) \tag{11} \]

in the present context. This estimate is related to a qualitative difference between $d_1$ and
$d_\infty$: whereas $d_1$ is homogeneous with respect to a rescaling of the supports of measures (a fact
which was exploited in the proof of Theorem 1), $d_\infty$ is invariant under rescaling. Which behavior
is more convenient may vary by the context.
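The comparison between the two distances, and their different behavior under rescaling, can be illustrated on a pair of empirical measures. The following sketch is an aside with hypothetical atoms and function names; it computes $d_1$ through the CDF representation in (3) and $d_\infty$ as the largest CDF gap, which for step functions is attained at an atom.

```python
def cdf_at(atoms, a):
    """Empirical CDF of the uniform measure on `atoms`, evaluated at a."""
    return sum(1 for x in atoms if x <= a) / len(atoms)

def d1(xs, ys):
    """Integral of |F_mu - F_nu|; both CDFs are constant between atoms."""
    pts = sorted(set(xs) | set(ys))
    return sum(
        abs(cdf_at(xs, a) - cdf_at(ys, a)) * (b - a)
        for a, b in zip(pts, pts[1:])
    )

def d_inf(xs, ys):
    """Kolmogorov distance: the sup of |F_mu - F_nu| is attained at an atom."""
    return max(abs(cdf_at(xs, a) - cdf_at(ys, a)) for a in set(xs) | set(ys))

def diameter(xs, ys):
    """Length of the smallest interval containing all atoms (plays the role of 2*rho)."""
    pts = list(xs) + list(ys)
    return max(pts) - min(pts)

# Hypothetical atoms, and the same atoms rescaled by s = 10.
mu_atoms = [0.0, 1.0, 3.0]
nu_atoms = [0.5, 2.0, 2.0]
s = 10.0
mu_scaled = [s * x for x in mu_atoms]
nu_scaled = [s * y for y in nu_atoms]

holds_unscaled = d1(mu_atoms, nu_atoms) <= diameter(mu_atoms, nu_atoms) * d_inf(mu_atoms, nu_atoms)
holds_scaled = d1(mu_scaled, nu_scaled) <= diameter(mu_scaled, nu_scaled) * d_inf(mu_scaled, nu_scaled)
```

In this example $d_1$ is multiplied by $s$ under rescaling while $d_\infty$ is unchanged, and the inequality holds in both cases because the interval length scales together with $d_1$.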

Inequality (11) makes some quantitative comparisons between the results of [3] and Theorem 1
possible. Observe that (9) and (10) only yield nontrivial information if $k \gg 1$ (which of course
requires $n \gg 1$), whereas under appropriate scaling, Theorem 1 is nontrivial for $n \gg 1$ even if
$k$ is small. In particular, (9) and (11) imply that the fluctuations of $d_1(\mu_E, \mu)$ above its
mean are of order (ignoring logarithmic factors) at most $k^{-1/2} \rho(T)$, whereas (2) together with
the general estimate $\sigma_k(T) \le \sqrt{k} \, \rho(T)$ yields fluctuations of order at most
$(kn)^{-1/2} \sigma_k(T) \le n^{-1/2} \rho(T)$.

The issue of the expected distance is more complicated. The general estimate
$\rho(T) \le \sigma_k(T)$ and inequalities (10) and (11) imply that

\[ \mathbb{E} d_1(\mu_E, \mu) \le c \, \frac{\rho(T) \log k}{\sqrt{k}} \le c \, \frac{\sigma_k(T)^{4/7} \rho(T)^{3/7} \log k}{\sqrt{k}}, \tag{12} \]

which is slightly weaker than (1) for $k$ large (in which case the lossy estimates used to arrive at
(12) mean that the comparison should probably not be taken too seriously) and significantly weaker
for $k$ small. Since the different distributions of $E$ are being ignored here, there is little point
in making the comparison very precise.

Finally, the comparison of fluctuations highlights that the methods of this paper are more sensitive
to the proximity of $T$ to the space of scalar operators. If $T$ is a (real) scalar operator then
$\mu_E$ is a constant point mass, so it is natural to expect that if $T$ is nearly scalar in some
sense then $\mu_E$ will be more tightly concentrated than in general. The results of [3] do not
directly reflect this at all, although the estimate (11) allows one to insert this effect by hand when
changing metrics. However, $\sigma_k(T)$ provides a sharper measure than $\rho(T)$ of how close $T$ is
to scalar, and in some cases the bound $(kn)^{-1/2} \sigma_k(T)$ on the order of the fluctuations may
be much smaller than $n^{-1/2} \rho(T)$. This is the case, for example, if $T$ has a large number of
tightly clustered eigenvalues with a small number of outliers.